Searching in text databases with non-standard orthography

Authors: Thomas Pilz

Published in: Dagstuhl Seminar Proceedings, Volume 6491, Digital Historical Corpora- Architecture, Annotation, and Retrieval (2007)


Abstract
In this paper we present research results of the recent project "Rule based search in text data bases with non-standard orthography". Producing a searchable text document from a facsimile involves numerous steps. This paper focuses on techniques to ensure better retrieval results on historical texts with non-standard spellings. Historical documents, especially those set in blackletter fonts, are prone to recognition errors. Adequate preparation of the image sources prior to OCR can reduce the number of misrecognized characters. Furthermore, a search engine with categorized distance measures, placed between the user interface and the text database, can improve retrieval results. Specific metrics cover problems in optical character recognition, transcription, and historical spelling variation. With a synoptic view interface, users can remain completely unaware of the methods applied to their queries.
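The idea of categorized distance measures can be illustrated with a weighted edit distance in which substitutions typical of OCR errors or historical spelling cost less than arbitrary ones. The following is only a minimal sketch under assumptions: the character pairs, the weights, and the names `CHEAP_SUBSTITUTIONS`, `CATEGORY_COST`, `weighted_edit_distance`, and `search` are hypothetical and are not taken from the paper or the project.

```python
# Hypothetical substitution categories; pairs and weights are illustrative
# assumptions, not values from the paper.
CHEAP_SUBSTITUTIONS = {
    "ocr": {("f", "s"), ("u", "n"), ("c", "e")},
    "spelling": {("i", "y"), ("u", "v"), ("i", "j")},
}
CATEGORY_COST = {"ocr": 0.2, "spelling": 0.3}


def substitution_cost(a, b):
    """Return 0 for equal characters, a reduced cost for known pairs, else 1."""
    if a == b:
        return 0.0
    for category, pairs in CHEAP_SUBSTITUTIONS.items():
        if (a, b) in pairs or (b, a) in pairs:
            return CATEGORY_COST[category]
    return 1.0


def weighted_edit_distance(query, candidate):
    """Levenshtein-style distance with category-dependent substitution costs."""
    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, qc in enumerate(query, start=1):
        curr = [float(i)]
        for j, cc in enumerate(candidate, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete qc
                curr[j - 1] + 1.0,                      # insert cc
                prev[j - 1] + substitution_cost(qc, cc),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def search(query, vocabulary, max_distance=0.5):
    """Return (distance, word) pairs within max_distance, closest first."""
    scored = [(weighted_edit_distance(query, word), word) for word in vocabulary]
    return sorted((d, w) for d, w in scored if d <= max_distance)


if __name__ == "__main__":
    # A modern query is compared against spellings found in the text database.
    for distance, word in search("und", ["vnd", "und", "uns", "hund"]):
        print(f"{word}\t{distance:.1f}")
```

With a scheme like this, a query in modern spelling can match historical or misrecognized variants without the user having to enter them, which is the kind of transparency the synoptic interface aims for.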

Cite as

Thomas Pilz. Searching in text databases with non-standard orthography. In Digital Historical Corpora- Architecture, Annotation, and Retrieval. Dagstuhl Seminar Proceedings, Volume 6491, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)



@InProceedings{pilz:DagSemProc.06491.14,
  author =	{Pilz, Thomas},
  title =	{{Searching in text databases with non-standard orthography}},
  booktitle =	{Digital Historical Corpora- Architecture, Annotation, and Retrieval},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6491},
  editor =	{Lou Burnard and Milena Dobreva and Norbert Fuhr and Anke L\"{u}deling},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06491.14},
  URN =		{urn:nbn:de:0030-drops-10533},
  doi =		{10.4230/DagSemProc.06491.14},
  annote =	{Keywords: Rule based search, Optical character recognition, spelling variation, edit distance}
}